# Efficient Deployment with vLLM

## DeepSeek-R1-0528-quantized.w4a16

License: MIT · Publisher: RedHatAI
Tags: Large Language Model, Safetensors
Downloads: 126 · Likes: 3

A quantized version of DeepSeek-R1-0528. Quantizing the weights to the INT4 data type significantly reduces GPU memory and disk-space requirements.
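
A minimal sketch of offline inference with vLLM, assuming the Hugging Face repo ID `RedHatAI/DeepSeek-R1-0528-quantized.w4a16` (inferred from the listing's publisher and model name) and illustrative parallelism and sampling settings:

```python
# Offline inference sketch with vLLM.
# The repo ID and tensor_parallel_size are assumptions; a model of this
# size typically needs its weights sharded across several GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/DeepSeek-R1-0528-quantized.w4a16",
    tensor_parallel_size=8,  # illustrative; match your hardware
)
params = SamplingParams(temperature=0.6, max_tokens=256)

outputs = llm.generate(
    ["Explain in one paragraph how INT4 weight quantization saves memory."],
    params,
)
print(outputs[0].outputs[0].text)
```
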
## Qwen2.5-VL-32B-Instruct-FP8-Dynamic

License: Apache-2.0 · Publisher: BCCard
Tags: Image-to-Text, Transformers, English
Downloads: 140 · Likes: 1

An FP8-quantized version of Qwen2.5-VL-32B-Instruct that accepts combined vision-and-text input and produces text output, suited to efficient inference scenarios.
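
Models like this are often served behind vLLM's OpenAI-compatible API. A sketch of a client query, assuming a server was launched separately (for example with `vllm serve BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic`, a repo ID inferred from the listing) and using a placeholder image URL:

```python
# Querying a locally running vLLM OpenAI-compatible server.
# Assumes the server was started beforehand, e.g.:
#   vllm serve BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic",  # inferred repo ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/receipt.png"}},  # placeholder
            {"type": "text", "text": "Describe what this image shows."},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```
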
## gemma-3-27b-it-FP8-Dynamic

License: Apache-2.0 · Publisher: RedHatAI
Tags: Image-to-Text, Transformers, English
Downloads: 1,608 · Likes: 1

A quantized version of google/gemma-3-27b-it with the weights stored in the FP8 data type. It accepts vision-and-text input, produces text output, and can be deployed efficiently with vLLM.
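
A minimal multimodal sketch using vLLM's `LLM.chat` API, which applies the model's own chat template so the image placeholder token is inserted automatically; the repo ID `RedHatAI/gemma-3-27b-it-FP8-dynamic` and the image URL are assumptions:

```python
# Multimodal chat sketch with vLLM's offline LLM.chat API.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/gemma-3-27b-it-FP8-dynamic")  # inferred repo ID

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/diagram.png"}},  # placeholder
        {"type": "text", "text": "Summarize this diagram in two sentences."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```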